Estimation and control in finite Markov decision processes with the average reward criterion
Authors
Abstract
Similar Resources
Bounded Parameter Markov Decision Processes with Average Reward Criterion
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we pro...
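For orientation, the optimistic and pessimistic criteria in the discounted setting (the case the abstract notes has already been analyzed) are commonly written as the following recursions; the symbols, including the set $\mathcal{P}(s,a)$ of admissible transition laws and the discount factor $\gamma$, are assumptions of this sketch rather than notation taken from the abstract:

$$\hat V_{\mathrm{opt}}(s) = \max_{a \in A(s)} \; \max_{p \in \mathcal{P}(s,a)} \Big[ r(s,a) + \gamma \sum_{s'} p(s')\, \hat V_{\mathrm{opt}}(s') \Big],$$
$$\hat V_{\mathrm{pes}}(s) = \max_{a \in A(s)} \; \min_{p \in \mathcal{P}(s,a)} \Big[ r(s,a) + \gamma \sum_{s'} p(s')\, \hat V_{\mathrm{pes}}(s') \Big].$$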
Average-Reward Decentralized Markov Decision Processes
Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, show...
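For context, the average-reward criterion named in this and the surrounding abstracts is the long-run expected reward per step; a standard (assumed, not quoted) definition of the gain of a policy $\pi$ started in state $s$ is

$$\rho^{\pi}(s) = \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}_{s}\Big[ \sum_{t=0}^{T-1} r(s_t, a_t) \Big].$$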
Fuzzy Decision Processes with an Average Reward Criterion
Within the same framework as fuzzy decision processes in the discounted case, we specify an average-criterion fuzzy model and develop its optimization by the “fuzzy max order” under appropriate conditions. The average reward is characterized, by introducing a relative value function, as the unique solution of the associated equation. We also derive the optimality equation using the “vanishing disco...
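The “associated equation” mentioned in this abstract is, in its standard crisp (non-fuzzy) form, the average-reward optimality equation linking the gain $g$ to a relative value function $h$; the fuzzy model replaces the ordinary maximum by the “fuzzy max order”, a step not reproduced in this sketch:

$$g + h(s) = \max_{a \in A(s)} \Big[ r(s,a) + \sum_{s'} p(s' \mid s,a)\, h(s') \Big], \qquad s \in S.$$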
OPTIMAL CONTROL OF AVERAGE REWARD CONSTRAINED CONTINUOUS-TIME FINITE MARKOV DECISION PROCESSES
The paper studies optimization of average-reward continuous-time finite state and action Markov Decision Processes with multiple criteria and constraints. Under the standard unichain assumption, we prove the existence of optimal K-switching strategies for feasible problems with K constraints. For switching randomized strategies, the decisions depend on the current state and the time spent i...
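The constrained problems referred to here have the usual form of maximizing one average reward subject to bounds on $K$ others; in the notation of the gain defined above (assumed, not taken from the abstract),

$$\max_{\pi} \; \rho_0^{\pi} \quad \text{subject to} \quad \rho_k^{\pi} \le C_k, \qquad k = 1, \dots, K.$$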
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated instead of the original MDP are given and compared ...
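As an illustration of how a pseudometric can drive state aggregation, the following Python sketch greedily groups states whose pairwise distance falls below a threshold and then builds an aggregated MDP by pooling transition mass and averaging rewards. The pseudometric matrix d, the threshold eps, and the uniform weighting inside clusters are assumptions of this sketch; the “adequate” pseudometrics and the loss bounds of the abstract are not reproduced here.

# Illustrative sketch only: epsilon-aggregation of MDP states under a given
# pseudometric d (any precomputed pseudometric matrix, assumed symmetric).
import numpy as np

def aggregate_states(d, eps):
    """Greedily group states whose pseudometric distance to a cluster
    representative is <= eps. Returns state index -> cluster index."""
    n = d.shape[0]
    cluster = [-1] * n
    representatives = []              # one representative state per cluster
    for s in range(n):
        for k, rep in enumerate(representatives):
            if d[s, rep] <= eps:      # close enough to an existing cluster
                cluster[s] = k
                break
        else:
            cluster[s] = len(representatives)
            representatives.append(s)
    return cluster

def aggregate_mdp(P, r, cluster):
    """Build an aggregated MDP: rewards are averaged over each cluster and
    transition mass is pooled into clusters (uniform weights inside clusters).
    P has shape (A, S, S), r has shape (S, A)."""
    A, S, _ = P.shape
    m = max(cluster) + 1
    members = [[s for s in range(S) if cluster[s] == k] for k in range(m)]
    P_agg = np.zeros((A, m, m))
    r_agg = np.zeros((m, A))
    for k, group in enumerate(members):
        r_agg[k] = r[group].mean(axis=0)
        for a in range(A):
            out = P[a][group].mean(axis=0)   # average outgoing distribution
            for j, group_j in enumerate(members):
                P_agg[a, k, j] = out[group_j].sum()
    return P_agg, r_agg

# Tiny usage example with a hand-made pseudometric on 4 states.
if __name__ == "__main__":
    d = np.array([[0.0, 0.1, 0.9, 0.8],
                  [0.1, 0.0, 0.9, 0.8],
                  [0.9, 0.9, 0.0, 0.1],
                  [0.8, 0.8, 0.1, 0.0]])
    cluster = aggregate_states(d, eps=0.2)             # -> [0, 0, 1, 1]
    P = np.random.dirichlet(np.ones(4), size=(2, 4))   # (A=2, S=4, S=4)
    r = np.random.rand(4, 2)                           # (S=4, A=2)
    P_agg, r_agg = aggregate_mdp(P, r, cluster)
    print(cluster, P_agg.shape, r_agg.shape)

The greedy grouping is only one of several reasonable choices; any clustering that respects the pseudometric could be substituted.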
Journal
Journal title: Applicationes Mathematicae
Year: 2004
ISSN: 1233-7234, 1730-6280
DOI: 10.4064/am31-2-1